Sentiment Lexicon Interpolation and Polarity Estimation of Objective and Out-Of-Vocabulary Words to Improve Sentiment Classification on Microblogging
نویسندگان
چکیده
Sentiment analysis has become an important classification task because a large amount of user-generated content is published over the Internet. Sentiment lexicons have been used successfully to classify the sentiment of user review datasets. More recently, microblogging services such as Twitter have become a popular data source in the domain of sentiment analysis. However, analyzing sentiments on tweets is still difficult because tweets are very short and contain slang, informal expressions, emoticons, mistyping and many words not found in a dictionary. In addition, more than 90 percent of the words in public sentiment lexicons, such as SentiWordNet, are objective words, which are often considered less important in a classification module. In this paper, we introduce a hybrid approach that incorporates sentiment lexicons into a machine learning approach to improve sentiment classification in tweets. We automatically construct an Add-on lexicon that compiles the polarity scores of objective words and out-ofvocabulary (OOV) words from tweet corpora. We also introduce a novel feature weighting method by interpolating sentiment lexicon score into uni-gram vectors in the Support Vector Machine (SVM). Results of our experiment show that our method is effective and significantly improves the sentiment classification accuracy compared to a baseline unigram model.
منابع مشابه
Lexicon-Based Sentiment Analysis on Topical Chinese Microblog Messages
Microblogging is a popular social media where people express their opinions and sentiment on social topics. The Chinese microblogging service, called Weibo, has become a remarkable media in the Chinese society. People are eager to know others’ attitudes towards social events, thus sentiment analysis on those topical microblog messages is important. In this paper we introduce a lexicon-based sen...
متن کاملA Supervised Method for Constructing Sentiment Lexicon in Persian Language
Due to the increasing growth of digital content on the internet and social media, sentiment analysis problem is one of the emerging fields. This problem deals with information extraction and knowledge discovery from textual data using natural language processing has attracted the attention of many researchers. Construction of sentiment lexicon as a valuable language resource is a one of the imp...
متن کاملUsing objective words in the reviews to improve the colloquial arabic sentiment analysis
One of the main difficulties in sentiment analysis of the Arabic language is the presence of the colloquialism. In this paper, we examine the effect of using objective words in conjunction with sentimental words on sentiment classification for the colloquial Arabic reviews, specifically Jordanian colloquial reviews. The reviews often include both sentimental and objective words; however, the mo...
متن کاملSentiment Analysis Based on Expanded Aspect and Polarity-Ambiguous Word Lexicon
This paper focuses on the task of disambiguating polarity-ambiguous words and the task is reduced to sentiment classification of aspects, which we refer to sentiment expectation instead of semantic orientation widely used in previous researches. Polarity-ambiguous words refer to words like” large, small, high, low ”, which pose a challenging task on sentiment analysis. In order to disambiguate ...
متن کاملA Supervised Approach for Sentiment Analysis using Skipgrams
We present a supervised hybrid approach for Sentiment Analysis in Twitter. A sentiment lexicon is built from a dataset, where each tweet is labelled with its overall polarity. In this work, skipgrams are used as information units (in addition to words and n-grams) to enrich the sentiment lexicon with combinations of words that are not adjacent in the text. This lexicon is employed in conjunctio...
متن کامل